Automatic Discourse Segmentation using Neural Networks

نویسندگان

  • Rajen Subba
  • Barbara Di Eugenio
چکیده

In example (1), a sentence from a Wall Street Journal article taken from the Penn TreeBank corpus is further segmented into four EDUs, (1a), (1b), (1c) and (1d) (RST, 2002). Discourse segmentation, clearly, is not as easy as sentence boundary detection. The lack of consensus with regards to what constitutes an elementary discourse unit adds to the difficulty. Building a rule based discourse segmenter can be a tedious task since these rules would have to be based on the underlying grammar of the particular parser that is to be used. Therefore, we adopted a neural network model for automatically building a discourse segmenter from an underlying corpus of segmented text. We chose to use part-of-speech tags, syntactic information, discourse cues and punctuation. Our ultimate goal is to build a discourse parser that uses this discourse segmenter. The data that we used to train and test our discourse segmenter is the RST-DT (RST, 2002) corpus. The corpus contains 385 Wall Street Journal articles from the Penn Treebank. The training set consists of 347 articles for a total of 6132 sentences, whilst the test set contains 38 articles for a total of 991 sentences. The RST-DT corpus provides us with pairs of sentences and EDUs. For the syntactic structure of the sentences, we have used both the gold standard Penn Treebank data and syntactic parse trees generated by (Charniak, 2000). As regards the discourse cues, we used a list of 168 possible discourse markers.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A multi-scale convolutional neural network for automatic cloud and cloud shadow detection from Gaofen-1 images

The reconstruction of the information contaminated by cloud and cloud shadow is an important step in pre-processing of high-resolution satellite images. The cloud and cloud shadow automatic segmentation could be the first step in the process of reconstructing the information contaminated by cloud and cloud shadow. This stage is a remarkable challenge due to the relatively inefficient performanc...

متن کامل

Neural Network-Based Learning Kernel for Automatic Segmentation of Multiple Sclerosis Lesions on Magnetic Resonance Images

Background: Multiple Sclerosis (MS) is a degenerative disease of central nervous system. MS patients have some dead tissues in their brains called MS lesions. MRI is an imaging technique sensitive to soft tissues such as brain that shows MS lesions as hyper-intense or hypo-intense signals. Since manual segmentation of these lesions is a laborious and time consuming task, automatic segmentation ...

متن کامل

Diagnosis of brain tumor using PNN neural networks

Cells grow and then need a very neat method to create new cells that work properly to maintain the health of the body. When the ability to control the growth of the cells is lost, they are unconsidered and often divided without order. Exemplified cells form a tissue mass called the tumor. In fact, brain tumors are abnormal and uncontrolled cell proliferations. Segmentation methods are used in b...

متن کامل

Automatic Paragraph Segmentation with Lexical and Prosodic Features

As long-form spoken documents become more ubiquitous in everyday life, so does the need for automatic discourse segmentation in spoken language processing tasks. Although previous work has focused on broad topic segmentation, detection of finer-grained discourse units, such as paragraphs, is highly desirable for presenting and analyzing spoken content. To better understand how different aspects...

متن کامل

Sentence Segmentation in Narrative Transcripts from Neuropsycological Tests using Recurrent Convolutional Neural Networks

Automated discourse analysis tools based on Natural Language Processing (NLP) aiming at the diagnosis of languageimpairing dementias generally extract several textual metrics of narrative transcripts. However, the absence of sentence boundary segmentation in the transcripts prevents the direct application of NLP methods which rely on these marks to function properly, such as taggers and parsers...

متن کامل

Sentence Segmentation in Narrative Transcripts from Neuropsychological Tests using Recurrent Convolutional Neural Networks

Automated discourse analysis tools based on Natural Language Processing (NLP) aiming at the diagnosis of languageimpairing dementias generally extract several textual metrics of narrative transcripts. However, the absence of sentence boundary segmentation in the transcripts prevents the direct application of NLP methods which rely on these marks to function properly, such as taggers and parsers...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007